摘要 :
This paper reviews the NTIRE 2022 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The task of the challenge was to super-resolve an input image with a magnification factor of ...
展开
This paper reviews the NTIRE 2022 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The task of the challenge was to super-resolve an input image with a magnification factor of ×4 based on pairs of low and corresponding high resolution images. The aim was to design a network for single image super-resolution that achieved improvement of efficiency measured according to several metrics including runtime, parameters, FLOPs, activations, and memory consumption while at least maintaining the PSNR of 29.00dB on DIV2K validation set. IMDN is set as the baseline for efficiency measurement. The challenge had 3 tracks including the main track (runtime), sub-track one (model complexity), and sub-track two (overall performance). In the main track, the practical runtime performance of the submissions was evaluated. The rank of the teams were determined directly by the absolute value of the average runtime on the validation set and test set. In sub-track one, the number of parameters and FLOPs were considered. And the individual rankings of the two metrics were summed up to determine a final ranking in this track. In sub-track two, all of the five metrics mentioned in the description of the challenge including runtime, parameter count, FLOPs, activations, and memory consumption were considered. Similar to sub-track one, the rankings of five metrics were summed up to determine a final ranking. The challenge had 303 registered participants, and 43 teams made valid submissions. They gauge the state-of-the-art in efficient single image super-resolution.
收起
摘要 :
Fully Homomorphic Encryption (FHE) is a technique that enables arbitrary computations on encrypted data directly. Number Theoretic Transform (NTT) is a fundamental component in FHE computations as it allows faster polynomial multi...
展开
Fully Homomorphic Encryption (FHE) is a technique that enables arbitrary computations on encrypted data directly. Number Theoretic Transform (NTT) is a fundamental component in FHE computations as it allows faster polynomial multiplication. However, it is computationally intensive and requires acceleration for practical deployment of FHE. The latency and throughput of existing NTT hardware designs are limited by the complex data communication pattern between adjacent NTT stages and the modular arithmetic operations. In this paper, we propose a parameterized architecture for NTT on FPGA. The architecture can be configured for a given polynomial degree, modulus and target hardware in order to optimize the latency and/or throughput. We develop a novel low latency fully pipelined modular arithmetic logic to implement the NTT core, the key computational unit of NTT. Streaming permutation network is used to reduce the data communication complexity between NTT stages. We implement the proposed architecture for various polynomial degrees, moduli, and data parallelism on state-of-the-art FPGAs. Experimental results show that our architecture configured to perform 4096 polynomial degree NTT achieves up to 1.29× and 4.32× improvement in latency and throughput respectively over state-of-the-art designs on FPGA.
收起
摘要 :
Fully Homomorphic Encryption (FHE) is a technique that enables arbitrary computations on encrypted data directly. Number Theoretic Transform (NTT) is a fundamental component in FHE computations as it allows faster polynomial multi...
展开
Fully Homomorphic Encryption (FHE) is a technique that enables arbitrary computations on encrypted data directly. Number Theoretic Transform (NTT) is a fundamental component in FHE computations as it allows faster polynomial multiplication. However, it is computationally intensive and requires acceleration for practical deployment of FHE. The latency and throughput of existing NTT hardware designs are limited by the complex data communication pattern between adjacent NTT stages and the modular arithmetic operations. In this paper, we propose a parameterized architecture for NTT on FPGA. The architecture can be configured for a given polynomial degree, modulus and target hardware in order to optimize the latency and/or throughput. We develop a novel low latency fully pipelined modular arithmetic logic to implement the NTT core, the key computational unit of NTT. Streaming permutation network is used to reduce the data communication complexity between NTT stages. We implement the proposed architecture for various polynomial degrees, moduli, and data parallelism on state-of-the-art FPGAs. Experimental results show that our architecture configured to perform 4096 polynomial degree NTT achieves up to 1.29× and 4.32× improvement in latency and throughput respectively over state-of-the-art designs on FPGA.
收起
摘要 :
In this paper, we focus on the task of extracting visual correspondences across videos. Given a query video clip from an action class, we aim to align it with training videos in space and time. Obtaining training data for such a f...
展开
In this paper, we focus on the task of extracting visual correspondences across videos. Given a query video clip from an action class, we aim to align it with training videos in space and time. Obtaining training data for such a fine-grained alignment task is challenging and often ambiguous. Hence, we propose a novel alignment procedure that learns such correspondence in space and time via cross video cycle-consistency. During training, given a pair of videos, we compute cycles that connect patches in a given frame in the first video by matching through frames in the second video. Cycles that connect overlapping patches together are encouraged to score higher than cycles that connect non-overlapping patches. Our experiments on the Penn Action and Pouring datasets demonstrate that the proposed method can successfully learn to correspond semantically similar patches across videos, and learns representations that are sensitive to object and action states.
收起
摘要 :
In this paper, we focus on the task of extracting visual correspondences across videos. Given a query video clip from an action class, we aim to align it with training videos in space and time. Obtaining training data for such a f...
展开
In this paper, we focus on the task of extracting visual correspondences across videos. Given a query video clip from an action class, we aim to align it with training videos in space and time. Obtaining training data for such a fine-grained alignment task is challenging and often ambiguous. Hence, we propose a novel alignment procedure that learns such correspondence in space and time via cross video cycle-consistency. During training, given a pair of videos, we compute cycles that connect patches in a given frame in the first video by matching through frames in the second video. Cycles that connect overlapping patches together are encouraged to score higher than cycles that connect non-overlapping patches. Our experiments on the Penn Action and Pouring datasets demonstrate that the proposed method can successfully learn to correspond semantically similar patches across videos, and learns representations that are sensitive to object and action states.
收起
摘要 :
Humans have a remarkable ability to use physical commonsense and predict the effect of collisions. But do they understand the underlying factors? Can they predict if the underlying factors have changed? Interestingly, in most case...
展开
Humans have a remarkable ability to use physical commonsense and predict the effect of collisions. But do they understand the underlying factors? Can they predict if the underlying factors have changed? Interestingly, in most cases humans can predict the effects of similar collisions with different conditions such as changes in mass, friction, etc. It is postulated this is primarily because we learn to model physics with meaningful latent variables. This does not imply we can estimate the precise values of these meaningful variables (estimate exact values of mass or friction). Inspired by this observation, we propose an interpretable intuitive physics model where specific dimensions in the bottleneck layers correspond to different physical properties. In order to demonstrate that our system models these underlying physical properties, we train our model on collisions of different shapes (cube, cone, cylinder, spheres etc.) and test on collisions of unseen combinations of shapes. Furthermore, we demonstrate our model generalizes well even when similar scenes are simulated with different underlying properties.
收起
摘要 :
Humans have a remarkable ability to use physical common-sense and predict the effect of collisions. But do they understand the underlying factors? Can they predict if the underlying factors have changed? Interestingly, in most cas...
展开
Humans have a remarkable ability to use physical common-sense and predict the effect of collisions. But do they understand the underlying factors? Can they predict if the underlying factors have changed? Interestingly, in most cases humans can predict the effects of similar collisions with different conditions such as changes in mass, friction, etc. It is postulated this is primarily because we learn to model physics with meaningful latent variables. This does not imply we can estimate the precise values of these meaningful variables (estimate exact values of mass or friction). Inspired by this observation, we propose an interpretable intuitive physics model where specific dimensions in the bottleneck layers correspond to different physical properties. In order to demonstrate that our system models these underlying physical properties, we train our model on collisions of different shapes (cube, cone, cylinder, spheres etc.) and test on collisions of unseen combinations of shapes. Furthermore, we demonstrate our model generalizes well even when similar scenes are simulated with different underlying properties.
收起
摘要 :
In the real world, the degradation of images taken under haze can be quite complex, where the spatial distribution of haze varies from image to image. Recent methods adopt deep neural networks to recover clean scenes from hazy ima...
展开
In the real world, the degradation of images taken under haze can be quite complex, where the spatial distribution of haze varies from image to image. Recent methods adopt deep neural networks to recover clean scenes from hazy images directly. However, due to the generic design of network architectures and the failure in estimating an accurate haze degradation model, the generalization ability of recent dehazing methods on real-world hazy images is not ideal. To address the problem of modeling real-world haze degradation, we propose a novel Separable Hybrid Attention (SHA) module to perceive haze density by capturing positional-sensitive features in the orthogonal directions to achieve this goal. Moreover, a density encoding matrix is proposed to model the uneven distribution of the haze explicitly. The density encoding matrix generates positional encoding in a semi-supervised way - such a haze density perceiving and modeling strategy captures the unevenly distributed degeneration at the feature-level effectively. Through a suitable combination of SHA and density encoding matrix, we design a novel dehazing network architecture, which achieves a good complexity-performance tradeoff. Comprehensive evaluation on both synthetic datasets and real-world datasets demonstrates that the proposed method surpasses all the state-of-the-art approaches with a large margin both quantitatively and qualitatively.
收起
摘要 :
In winter scenes, the degradation of images taken under snow can be pretty complex, where the spatial distribution of snowy degradation varies from image to image. Recent methods adopt deep neural networks to recover clean scenes ...
展开
In winter scenes, the degradation of images taken under snow can be pretty complex, where the spatial distribution of snowy degradation varies from image to image. Recent methods adopt deep neural networks to recover clean scenes from snowy images directly. However, due to the paradox caused by the variation of complex snowy degradation, achieving reliable High-Definition image desnowing performance in real time is a considerable challenge. We develop a novel Efficient Pyramid Network with asymmetrical encoder-decoder architecture for realtime HD image desnowing. The general idea of our proposed network is to utilize the multi-scale feature flow fully and implicitly to mine clean cues from features. Compared with previous state-of-the-art desnowing methods, our approach achieves a better complexity-performance trade-off and effectively handles the processing difficulties of HD and Ultra-HD images. The extensive experiments on three large-scale image desnowing datasets demonstrate that our method surpasses all state-of-the-art approaches by a large margin both quantitatively and qualitatively, boosting the PSNR metric from 31.76 dB to 34.10 dB on the CSD test dataset and from 28.29 dB to 30.87 dB on the SRRS test dataset.
收起
摘要 :
Underwater Image Rendering aims to generate a true-to-life underwater image from a given clean one, which could be applied to various practical applications such as underwater image enhancement, camera filter, and virtual gaming. ...
展开
Underwater Image Rendering aims to generate a true-to-life underwater image from a given clean one, which could be applied to various practical applications such as underwater image enhancement, camera filter, and virtual gaming. We explore two less-touched but challenging problems in underwater image rendering, namely, i) how to render diverse underwater scenes by a single neural network? ii) how to adaptively learn the underwater light fields from natural exemplars, i,e., realistic underwater images? To this end, we propose a neural rendering method for underwater imaging, dubbed UWNR (Underwater Neural Rendering). Specifically, UWNR is a data-driven neural network that implicitly learns the natural degenerated model from authentic underwater images, avoiding introducing erroneous biases by hand-craft imaging models.Compared with existing underwater image generation methods, UWNR utilizes the natural light field to simulate the main characteristics of the underwater scene. Thus, it is able to synthesize a wide variety of underwater images from one clean image with various realistic underwater images.Extensive experiments demonstrate that our approach achieves better visual effects and quantitative metrics over previous methods. Moreover, we adopt UWNR to build an open Large Neural Rendering Underwater Dataset containing various types of water quality, dubbed LNRUD. The source code and LNRUD are available at https://github.com/Ephemeral182/UWNR.
收起